Abstract

We study constructions of $k \times n$ matrices $A$ that both (1) satisfy the
restricted isometry property (RIP) at sparsity $s$ with optimal parameters,
and (2) are efficient in the sense that only $O(n \log n)$ operations
are required to compute $Ax$ given a vector $x$. Our construction is based
on repeated application of independent transformations of the form
$DH$, where $H$ is a Hadamard or Fourier transform and $D$ is a diagonal
matrix with random $\{+1,-1\}$ elements on the diagonal, followed by
any $k \times n$ matrix with orthonormal rows (e.g. selection of $k$ coordinates).
We provide guarantees (1) and (2) for a larger regime of parameters,
in which such constructions were previously unknown. Additionally,
our construction does not suffer from the extra poly-logarithmic factor
multiplying the number of observations $k$ as a function of the sparsity
$s$, as present in the currently best known RIP estimates for partial random
Fourier matrices and other classes of structured random matrices.



1 Introduction 

The theory of compressive sensing predicts that sparse vectors can be stably
reconstructed from a small number of linear measurements via efficient
reconstruction algorithms including $\ell_1$-minimization [5], [9]. The restricted
isometry property (RIP) of the measurement matrix streamlines the analysis 
of various reconstruction algorithms [TJ [H [TOl [T2] . All known matrices that 
satisfy the RIP in the optimal parameter regime (see below for details) are 
based on randomness. Well-known examples include Gaussian and Bernoulli 
matrices where all entries are independent. Unfortunately, such matrices do
not possess any structure and therefore admit no fast matrix-vector
multiplication algorithm. A fast multiplication is important for speeding up
recovery algorithms. This article addresses constructions of matrices that
satisfy the RIP in the optimal parameter regime and have fast matrix-vector
multiplication algorithms.

A vector $x \in \mathbb{C}^n$ is said to be $s$-sparse if the number of nonzero entries
of $x$ is at most $s$. A matrix $A \in \mathbb{C}^{k \times n}$ satisfies the RIP with respect to
parameters $(s,\delta)$ if, for all $s$-sparse vectors $x \in \mathbb{C}^n$,

\[
(1-\delta)\|x\|_2 \le \|Ax\|_2 \le (1+\delta)\|x\|_2, \tag{1.1}
\]

where $\|\cdot\|_2$ denotes the Euclidean norm.^1 If $A$ satisfies the RIP with
parameters $(2s, \delta^*)$ for a suitable $\delta^* < 1$, then a variety of recovery algorithms
reconstruct an $s$-sparse vector $x$ exactly from $y = Ax$. Moreover, reconstruction
is stable under passing from sparse to approximately sparse vectors and
under adding noise to the measurements. The value of $\delta^*$ depends only on
the reconstruction algorithm [6l [Ml El EOl H] ■ 
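
As an illustration of definition (1.1), the following minimal sketch estimates how far a given matrix is from being an isometry on sparse vectors. It is not part of the original text: the function name `empirical_rip_lower_bound` and the random sampling scheme are assumptions made for this example, and sampling only yields a lower bound on the true RIP constant, since the definition quantifies over all $s$-sparse vectors.

```python
import math
import random


def norm2(v):
    """Euclidean norm of a real or complex vector."""
    return math.sqrt(sum(abs(t) ** 2 for t in v))


def matvec(A, x):
    """Dense matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def empirical_rip_lower_bound(A, s, trials=200, seed=0):
    """Largest observed | ||Ax||_2 / ||x||_2 - 1 | over random s-sparse x.

    Hypothetical helper: it samples supports and Gaussian entries, so the
    returned value can only underestimate the RIP constant in (1.1).
    """
    rng = random.Random(seed)
    n = len(A[0])
    worst = 0.0
    for _ in range(trials):
        x = [0.0] * n
        for j in rng.sample(range(n), s):
            x[j] = rng.gauss(0.0, 1.0)
        nx = norm2(x)
        if nx == 0.0:
            continue
        worst = max(worst, abs(norm2(matvec(A, x)) / nx - 1.0))
    return worst


n = 6
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
doubled = [[2.0 * v for v in row] for row in identity]
delta_identity = empirical_rip_lower_bound(identity, s=2)
delta_doubled = empirical_rip_lower_bound(doubled, s=2)
```

The identity matrix is an exact isometry, so its observed distortion is zero, while doubling every entry doubles every norm and gives distortion one.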

It is well known by now [H], [7], [17] that a Gaussian random matrix (having
independent normally distributed entries of variance $1/k$) satisfies the RIP
with parameters $(s,\delta)$ with probability at least $1 - e^{-c\delta^2 k}$ if

\[
k \ge C\delta^{-2} s \log(n/s),
\]

where $c, C > 0$ are universal constants. Using lower bounds for Gelfand
widths of $\ell_p$-balls for $0 < p \le 1$, it can be shown that $k$ must be at least
$C_\delta\, s \log(n/s)$ for the RIP to hold [H]. It can further be shown [12] that
the constant (as a function of $\delta$) satisfies $C_\delta \ge C\delta^{-2}$. Since we will always
assume in this paper that $s \le Cn^{1/2}$, $\log(n/s)$ is equivalent to $\log n$ up to
constants. Hence, we will say that a $k \times n$ matrix is RIP-optimal at $s$ if it
satisfies the RIP with $(s, \delta)$ for

\[
\delta = C\sqrt{\frac{s \log n}{k}} .
\]

(The reader should keep in mind that for large $s$, RIP optimality should be
defined to hold when $k$ is at most $C\delta^{-2} s \log(n/s)$.)

^1 In much of the related literature, the definition of RIP uses squared Euclidean norms.
Definition (1.1) is, however, more convenient for our purposes. Of course, both versions
are equivalent up to a transformation of the parameter $\delta$.

The restricted isometry property is closely connected to Johnson-Lindenstrauss
embeddings. We say that a random $k \times n$ matrix $A$ satisfies the
Johnson-Lindenstrauss property (JLP) with parameters $(N, \delta)$ if for any set
$X \subset \mathbb{R}^n$ of cardinality at most $N$, (1.1) holds uniformly for all $x \in X$ with
constant probability. It is well known that a matrix of independently drawn
subgaussian elements satisfies JLP if $k \ge C\delta^{-2} \log N$. Specializations of this
fact to Gaussians and to Bernoulli random variables can be found in \1'6\ [T] . 
The general claim is obtainable by noting that the subgaussian property is
the crux of these proofs. If $A$ satisfies JLP with $(N,\delta)$ for $k \le C\delta^{-2} \log N$,
then we say that $A$ is JLP-optimal.

The JLP and RIP properties are known to be almost equivalent, in a
certain sense. One direction is stated as follows: a (random) $k \times n$ matrix
satisfying JLP with $(N,\delta)$ satisfies RIP with $(C(\log N)/(\log n), \delta)$ with
constant probability [H], [17]. Hence, for any arbitrarily small $\mu > 0$ an
RIP-optimal matrix can be obtained by drawing a JLP-optimal matrix with
$k = C\delta^{-2} s \log n$. The derivation of RIP from JLP is a specialization of JLP
to a set $X$ consisting of an $\epsilon$-net of $s$-sparse unit vectors, for $\epsilon = 0.1$ say,
which has cardinality at most $(Cn)^s$.

The other direction is a remarkable recent result by Krahmer and Ward
[15] implying that if $A$ has RIP with $(s, \delta/4)$, then $AD$ has JLP with $(N, \delta)$
as long as $N \le 2^s$, where $D$ is a diagonal matrix with independent random
signs ($\pm 1$) on the diagonal. Notice that from this result, RIP-optimality of
$A$ does not imply JLP-optimality of $AD$, because RIP-optimality implies
that the embedding dimension $k$ is at least $C\delta^{-2} s \log n$, which suffers
from an additional factor of $\log n$ compared to the JLP-optimality guarantee
bound of $C\delta^{-2} \log N = C\delta^{-2} s$ (for $N = 2^s$). From this observation we
intuitively conclude that RIP-optimality is weaker than JLP-optimality, and
hence expect that RIP-optimal constructions should be easier to obtain.
The main results of this paper, roughly speaking, confirm this by providing
constructions of RIP-optimal matrices which are simpler than previously
known constructions that relied on JLP-optimality.

1.1 Known Constructions of RIP-optimal and JLP-optimal 
matrices 

No deterministic RIP-optimal matrix constructions are known. Deterministic
constructions of RIP matrices are known only for a grossly suboptimal
regime of parameters. See [15] for a nice survey of such constructions.

Of particular interest are RIP or JLP matrices $A$ that are efficient in the
sense that for any vector $x \in \mathbb{C}^n$, $Ax$ can be computed in time $O(n \log n)$.
Such constructions are known for JLP-optimal (and hence also RIP-optimal)
matrices as long as $k \le n^{1/2-\mu}$ for any arbitrarily small $\mu$ [3]. This is
achieved with the transformation $B H D^{(1)} H D^{(2)} H \cdots H D^{(r)}$, where the $D^{(i)}$ are
independent random sign diagonal matrices, $H$ are Hadamard transforms
and $B$ is a subsampled (and rescaled) Hadamard transform, where the subset
of sampled coordinates is related to a carefully constructed dual binary
code, and $r$ is at most $C/\mu$. For larger $k$, the best efficient constructions
satisfying RIP are due to Rudelson and Vershynin [22] with $k \ge C s \log^4 n$,
namely a factor of $\log^3 n$ away from optimal, see also [20]. (Note that at
least two of the logn factors can be improved to logs, but in light of the 
aforementioned positive results, we can assume that s is at least, say, v}^^ .) 
This was recently improved by Nelson et al. to only $\log^2 n$ factors away
from optimal [18]. The construction in [22] is a Fourier transform followed
by a subsampling operator. Another family of almost RIP-optimal matrices
with a fast transform algorithm is that of partial random circulant matrices
and time-frequency structured random matrices. The best known results in
that vein have recently appeared in the work of Krahmer, Mendelson and
Rauhut [14], where RIP matrices of almost optimal embedding dimension
$k = C s \log^4 n$ are designed. These constructions improve on previous work
in [21].

1.2 Contribution 

In this work we construct RIP-optimal, efficient matrices for the regime
$s \le n^{1/2-\mu}$ that are simpler than those implied by the JLP construction
in [3]. For $s \le Cn^{1/3}/\log^{2/3} n$, we show that the transformation $SHDHD'H$
is RIP-optimal, where $H$ is a Hadamard or Fourier transform, $D$ and $D'$ are
independent random sign diagonal matrices and $S$ is an arbitrary deterministic
subsampling (and rescaling) matrix of order $k$. This complements a
previous RIP result in a similar parameter regime of $s \le n^{1/3}/\mathrm{polylog}(n)$
implied by a JLP construction by Ailon and Chazelle in [2], in which the
transformation $THD$ is used, where $T$ is a sparse random matrix with
nonzero elements. The surprising part about our construction is that $S$ can
be taken to be an arbitrary (deterministic) subsampling matrix. The random
subsampling used in the construction of [2] is, in a sense, traded off against
two more Fourier transforms and one more random diagonal matrix here.
For $s \le Cn^{1/2-\mu}$, we show that the transformation $P D^{(1)} H D^{(2)} H \cdots D^{(r)} H$
is RIP-optimal and efficient, where the $D^{(i)}$'s are as above, $P$ is an arbitrary
deterministic matrix with properly normalized, pairwise orthogonal rows
and $r$ is at most $C/\mu$. This is simpler than the RIP-optimal efficient
construction implied by the aforementioned JLP-optimal efficient construction
in [3], because no binary code designs are necessary.






Our main proof techniques involve concentration inequalities of vector 
valued Rademacher chaos of degree 2. For the second construction an ad- 
ditional bootstrapping argument is applied that, roughly speaking, shows 
that the RIP parameters of ADHD'H are better than those of A. 

2 Notation and Main Results 

Throughout, the letter $C$ denotes a general global constant, whose value may
change from appearance to appearance. The integer $n$ denotes the ambient
dimension, $k \le n$ denotes the embedding dimension, and $\mathbb{C}^n$ denotes the
$n$-dimensional complex space with the standard inner product. The usual
$\ell_p$-norms are denoted by $\|\cdot\|_p$, the spectral norm of a matrix $A$ by $\|A\|$, and
the Frobenius norm by $\|A\|_F = \sqrt{\operatorname{trace}(A^*A)}$.

We let $H \in \mathbb{C}^{n \times n}$ denote a fixed matrix with the following properties:

1. $H$ is unitary,

2. the maximal absolute entry of $H$ is $n^{-1/2}$,

3. the transformation $Hx$ given a vector $x \in \mathbb{C}^n$ can be computed in time
$O(n \log n)$.

Both the discrete Fourier matrix and the Walsh-Hadamard matrix are examples
of such matrices. Note that the upper bound of $n^{-1/2}$ in property 2
above could be replaced by $Kn^{-1/2}$ for any constant $K$, thus encompassing
transformations such as the discrete cosine transform (with $K = \sqrt{2}$) with
little effect on the guarantees. We have decided to concentrate on the case
$K = 1$ for simplicity.
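
The three properties of $H$ can be checked concretely for the Walsh-Hadamard matrix. The sketch below is an illustration added here, not code from the paper: the function name `fwht` is an assumption, and the routine is the standard in-place butterfly scheme, which uses $\log_2 n$ passes of $n$ additions each.

```python
import math


def fwht(x):
    """Return H x for the normalized Walsh-Hadamard matrix H.

    len(x) must be a power of two. The butterfly loop performs log2(n)
    stages, so the cost is O(n log n) arithmetic operations.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)  # normalization that makes H unitary
    return [v * scale for v in y]


n = 8
# Columns of H, obtained by transforming the standard basis vectors.
columns = [fwht([1.0 if i == j else 0.0 for i in range(n)]) for j in range(n)]
max_entry = max(abs(v) for col in columns for v in col)

x = [3.0, -1.0, 0.0, 2.0, 0.5, 0.0, -4.0, 1.0]
norm_in = math.sqrt(sum(v * v for v in x))
norm_out = math.sqrt(sum(v * v for v in fwht(x)))
round_trip = fwht(fwht(x))  # H is symmetric and orthogonal, so H H = Id
```

Here `max_entry` equals $n^{-1/2}$ (property 2) and `norm_in` agrees with `norm_out` (property 1).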

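For any vector $z \in \mathbb{C}^n$, $D_z$ denotes the diagonal matrix with the elements
of $z$ on the diagonal. For a subset $\Omega$ of $\{1, \ldots, n\}$, let $\operatorname{ind}(\Omega) \in \mathbb{R}^n$ denote
the vector with 1 at coordinates $i \in \Omega$ and 0 elsewhere. Then define $P_\Omega =
D_{\operatorname{ind}(\Omega)}$ and $R_\Omega \in \mathbb{R}^{|\Omega| \times n}$ to be the map that restricts a vector in $\mathbb{C}^n$ to its
entries in $\Omega$, i.e., a subsampling operator.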

Recall that a $k \times n$ matrix $A$ has the RIP property with respect to
parameters $(s,\delta)$, where $s$ is an integer and $\delta > 0$, if for any $s$-sparse unit
vector $x \in \mathbb{C}^n$,

\[
1-\delta \le \|Ax\|_2 \le 1+\delta .
\]

(Note that we allow $\delta > 1$, unlike typical definitions of RIP.) For a fixed
sparsity parameter $s$, we denote by $\delta_s(A)$ the infimum over all $\delta$ such that
$A$ has the RIP property with respect to parameters $(s,\delta)$. We say that $A$ is
RIP-optimal at a given sparsity parameter $s$ if

\[
\delta_s(A) \le C\sqrt{\frac{s \log n}{k}} . \tag{2.1}
\]

A random vector $\epsilon$ with independent entries that take the values $\pm 1$ with
equal probability is called a Rademacher vector.

Our first main result provides a simple RIP-optimal matrix with a fast
transform algorithm for small sparsities $s = O(n^{1/3}/\log^{2/3} n)$.

Theorem 2.1. Let $\epsilon, \epsilon' \in \{\pm 1\}^n$ be two independent Rademacher vectors,
and let $\Omega$ be any subset of $\{1, \ldots, n\}$ of size $k$. The $k \times n$ random matrix
$A = \sqrt{n/k}\, R_\Omega H D_\epsilon H D_{\epsilon'} H$ is RIP-optimal for the regime $k \le \sqrt{n/s}$. More precisely, if

\[
\sqrt{n/s} \ge k \ge C\delta^{-2} s \log n, \tag{2.2}
\]

then $A$ satisfies the RIP (1.1) with probability at least $1 - e^{-c\delta^2 k}$. In
particular, the conditions on $k$ entail

\[
s \le C\,\frac{n^{1/3}}{\log^{2/3} n} .
\]

Clearly, $Ax = \sqrt{n/k}\, R_\Omega H D_\epsilon H D_{\epsilon'} H x$ can be computed in $O(n \log n)$ time by
assumption on $H$. Also note that (2.2) implies a restriction on $\delta$ for which
the result applies, namely

\[
\delta^2 \ge C\,\frac{s^{3/2} \log n}{\sqrt{n}} . \tag{2.3}
\]
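
To make the $O(n \log n)$ claim concrete, here is a minimal sketch of applying the matrix of Theorem 2.1 to a vector, with $H$ taken to be the normalized Walsh-Hadamard transform. The function names `fwht` and `apply_construction`, the seed, and the toy sizes are assumptions made for this illustration; the normalization $\sqrt{n/k}$ follows the definition of $\alpha(x)$ in Section 3.

```python
import math
import random


def fwht(x):
    """Normalized Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def apply_construction(x, eps, eps_prime, omega):
    """Compute sqrt(n/k) * R_Omega H D_eps H D_eps' H x.

    Three fast transforms and two sign flips, so the cost is O(n log n);
    omega is the list of k retained coordinates (0-based here).
    """
    n = len(x)
    y = fwht(x)                                   # H x
    y = fwht([e * v for e, v in zip(eps_prime, y)])  # H D_eps' H x
    y = fwht([e * v for e, v in zip(eps, y)])        # H D_eps H D_eps' H x
    scale = math.sqrt(n / len(omega))
    return [scale * y[i] for i in omega]


rng = random.Random(0)
n, k = 16, 4
eps = [rng.choice((-1, 1)) for _ in range(n)]
eps_prime = [rng.choice((-1, 1)) for _ in range(n)]

x = [0.0] * n
x[2], x[9] = 0.6, -0.8                            # a 2-sparse unit vector

y = apply_construction(x, eps, eps_prime, list(range(k)))
full = apply_construction(x, eps, eps_prime, list(range(n)))
full_norm = math.sqrt(sum(v * v for v in full))
```

When every coordinate is kept ($k = n$) the map is a product of unitary matrices, so `full_norm` equals $\|x\|_2 = 1$; the theorem concerns how well this is preserved after keeping only $k$ coordinates.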

Our second main result gives an RIP-optimal matrix construction with
a fast transform for the enlarged parameter regime $s = O(\sqrt{n}/\log n)$ and
$k = O(n/(s \log n))$.

Theorem 2.2. Assume $s_0 \le s \log n \le k \le \sqrt{n}$ and $\kappa = C\sqrt{sk \log n / n} < 1/2$,
where $s_0$ is a global constant. Let $A$ be an arbitrary $k \times n$ matrix satisfying
$AA^* = \frac{n}{k}\mathrm{Id}_k$. Let

\[
r = \left\lceil \frac{-\log(2\sqrt{n/k})}{\log \kappa} \right\rceil ,
\]

and let $\epsilon_{(1)}, \epsilon'_{(1)}, \ldots, \epsilon_{(r+1)}, \epsilon'_{(r+1)} \in \{\pm 1\}^n$ denote
independent Rademacher vectors. Then the $k \times n$ matrix

\[
\hat{A} = A D_{\epsilon_{(1)}} H D_{\epsilon'_{(1)}} H D_{\epsilon_{(2)}} H D_{\epsilon'_{(2)}} H \cdots D_{\epsilon_{(r+1)}} H D_{\epsilon'_{(r+1)}} H \tag{2.4}
\]

is RIP-optimal with probability at least 0.99, that is, (1.1) holds if $k \ge
C\delta^{-2} s \log n$.

In particular, if we strengthen the constraints by requiring $s \le n^{1/2-\mu}$
for some global $\mu > 0$, then $\hat{A}$ is also computationally efficient in the sense
that $\hat{A}x$ can be computed in time $O(n \log n)$.
The probability 0.99 in the above theorem is arbitrary and can be replaced
by any other value in $(0, 1)$. This affects only the constant $C$.
However, we remark that the present proof does not seem to give the optimal
dependence of $C$ on the probability bound.

It is presently not clear whether the restrictions $s = O(n^{1/3}/\log^{2/3} n)$
and $s = O(\sqrt{n}/\log n)$ in the above theorems can be removed. In any case,
regimes of small $s$ are the ones of most interest in compressive sensing.

Section 3 is dedicated to proving Theorem 2.1, and Section 4 proves
Theorem 2.2.

3 The regime $s = O(n^{1/3}/\log^{2/3} n)$



Our first randomized, computationally efficient, RIP-optimal construction
involves three applications of $H$, two random sign diagonal matrices, and a
choice of an arbitrary set $\Omega$ of $k$ coordinates. Fix $x \in U_s := \{x \in \mathbb{C}^n : \|x\|_2 =
1,\ x \text{ is } s\text{-sparse}\}$ and let $\epsilon, \epsilon' \in \{\pm 1\}^n$ be two random sign vectors. Consider
the following random variable, indexed by $x$,

\[
\alpha(x) = \sqrt{\frac{n}{k}}\, \|P_\Omega H D_\epsilon H D_{\epsilon'} H x\|_2 ,
\]

where $k$ is the cardinality of $\Omega$. It is not hard to see that $\mathbb{E}[\alpha(x)^2] = 1$.
Indeed, denoting $\tilde{x} = H D_{\epsilon'} H x$ and conditioning on a fixed value of $\epsilon'$, for
any $i \in \{1, \ldots, n\}$,

\[
\mathbb{E}_\epsilon \|P_{\{i\}} H D_\epsilon \tilde{x}\|_2^2 = \|\tilde{x}\|_2^2/n = 1/n .
\]
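
The identity $\mathbb{E}[\alpha(x)^2] = 1$ can be verified exactly for a tiny instance by averaging over all sign patterns instead of sampling. The sketch below is an added illustration under stated assumptions: $H$ is the normalized Walsh-Hadamard matrix, $n = 4$, and the names `fwht` and `alpha_squared` as well as the chosen $x$ and $\Omega$ are hypothetical.

```python
import itertools
import math


def fwht(x):
    """Normalized Walsh-Hadamard transform (length must be a power of two)."""
    n = len(x)
    y = list(x)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = y[i], y[i + h]
                y[i], y[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]


def alpha_squared(x, eps, eps_prime, omega):
    """(n/k) * || P_Omega H D_eps H D_eps' H x ||_2^2 for 0-based omega."""
    n = len(x)
    y = fwht([e * v for e, v in zip(eps_prime, fwht(x))])
    y = fwht([e * v for e, v in zip(eps, y)])
    return (n / len(omega)) * sum(y[i] ** 2 for i in omega)


n = 4
x = [0.6, 0.0, -0.8, 0.0]          # unit vector, 2-sparse
omega = [1, 3]                     # arbitrary deterministic coordinate set
signs = list(itertools.product((-1, 1), repeat=n))

# Exact expectation: average over all 16 * 16 pairs of sign vectors.
mean_alpha_sq = sum(
    alpha_squared(x, e, ep, omega) for e in signs for ep in signs
) / len(signs) ** 2
```

The average equals 1 up to floating-point rounding, for any choice of `omega`, which mirrors the conditioning argument above.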

The random variable $\alpha(x)$ is the norm of a decoupled Rademacher chaos
of degree 2. For the sake of notational convenience, we denote, for $i,j \in
\{1, \ldots, n\}$,

\[
x_{ij} = \sqrt{\frac{n}{k}}\, P_\Omega H P_{\{i\}} H P_{\{j\}} H x , \tag{3.1}
\]

so that we can conveniently write

\[
\alpha(x) = \Big\| \sum_{i=1}^n \sum_{j=1}^n \epsilon_i \epsilon'_j x_{ij} \Big\|_2 .
\]

By a seminal result of Talagrand [23], a Rademacher chaos concentrates
around its median. We will exploit the following version.
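Theorem 3.1. With a double sequence $x_{ij}$, $i,j = 1, \ldots, n$, of vectors in $\mathbb{C}^n$
and two independent Rademacher vectors $\epsilon, \epsilon' \in \{\pm 1\}^n$ let

\[
\alpha = \Big\| \sum_{i=1}^n \sum_{j=1}^n \epsilon_i \epsilon'_j x_{ij} \Big\|_2 .
\]

Let $M_\alpha$ be a median of $\alpha$. For $y \in \mathbb{C}^n$ introduce the $n \times n$ matrix $B_y =
(y^* x_{ij})_{i,j=1}^n$ and the parameters

\[
U = \sup_{y \in \mathbb{C}^n,\ \|y\|_2 \le 1} \|B_y\| , \tag{3.2}
\]

\[
V = \mathbb{E} \sup_{y \in \mathbb{C}^n,\ \|y\|_2 \le 1} \left( \|B_y \epsilon\|_2^2 + \|B_y^* \epsilon'\|_2^2 \right)^{1/2} . \tag{3.3}
\]

Then, for $t > 0$,

\[
\Pr(|\alpha - M_\alpha| \ge t) \le 2\exp\left( -C \min\left\{ \frac{t^2}{V^2}, \frac{t}{U} \right\} \right) . \tag{3.4}
\]

Proof. With $A = (x_{ij})_{i,j=1}^n$ and

\[
S = \frac{1}{2} \begin{pmatrix} 0 & A \\ A^* & 0 \end{pmatrix}, \qquad \tilde{\epsilon} = \begin{pmatrix} \epsilon \\ \epsilon' \end{pmatrix},
\]

we can rewrite the decoupled chaos as the coupled symmetric chaos

\[
\tilde{\epsilon}^* S \tilde{\epsilon} = \sum_{i=1}^n \sum_{j=1}^n \epsilon_i \epsilon'_j x_{ij},
\]

where matrix multiplication is extended in an obvious way to matrices with
vector-valued entries. Observe that $S$ has zero diagonal. Therefore, the
claim follows from Theorem 1.2 in [23]. □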

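We bound the quantity $U = U(x)$ in (3.2), where the $x_{ij}$ are defined by
(3.1). Note that $\sum_{i,j} \alpha_i \beta_j y^* x_{ij} = \sqrt{n/k}\, y^* P_\Omega H D_\alpha H D_\beta H x$, and hence

\[
\begin{aligned}
U &= \sup_{\|y\|_2, \|\alpha\|_2, \|\beta\|_2 \le 1} \sum_{i=1}^n \sum_{j=1}^n \alpha_i \beta_j\, y^* x_{ij}
   = \sqrt{\frac{n}{k}} \sup_{y,\alpha,\beta} y^* P_\Omega H D_\alpha H D_\beta H x \\
  &= \sqrt{\frac{n}{k}} \sup_{y,\alpha,\beta} \alpha^* D_{y^* P_\Omega H}\, H\, D_{Hx}\, \beta \\
  &\le \sqrt{\frac{n}{k}} \sup_{y,\alpha,\beta} \|\alpha\|_2 \cdot \|D_{y^* P_\Omega H}\| \cdot \|H\| \cdot \|D_{Hx}\| \cdot \|\beta\|_2 \\
  &\le \sqrt{\frac{n}{k}} \sup_{y} \|y^* P_\Omega H\|_\infty \cdot \|Hx\|_\infty \\
  &\le \sqrt{\frac{n}{k}} \sup_{y} \|P_\Omega y\|_1 n^{-1/2} \cdot \|x\|_1 n^{-1/2}
   \le \sqrt{\frac{s}{n}} .
\end{aligned} \tag{3.5}
\]

The last step uses $\|P_\Omega y\|_1 \le \sqrt{k}\,\|y\|_2$ and $\|x\|_1 \le \sqrt{s}\,\|x\|_2$.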
To bound $V$, we define a process $\nu(y)$ as

\[
\nu(y) = \sqrt{\|B_y \epsilon\|_2^2 + \|B_y^* \epsilon'\|_2^2} \tag{3.6}
\]

so that $V = \mathbb{E}\sup_{\|y\| \le 1} \nu(y)$. By the definition of the vectors $x_{ij}$, it clearly
suffices to take the supremum over vectors $y$ supported on $\Omega$. For any such
$y$, let $\mu_\nu(y) = \mathbb{E}\nu(y)$. Jensen's inequality yields

\[
\mu_\nu(y) \le \sqrt{\mathbb{E}\nu(y)^2} \le \sqrt{\frac{2n}{k}}\, \|D_{y^* P_\Omega H}\, H\, D_{Hx}\|_F . \tag{3.7}
\]

By definition of the Frobenius norm, together with the fact that the matrix
elements of $H$ are bounded above by $n^{-1/2}$ in absolute value, we obtain

\[
\mu_\nu(y) \le \|y\|_2 \sqrt{2/k} . \tag{3.8}
\]

We use a concentration bound for vector-valued Rademacher sums (tail
inequality (1.9) in [16]) to notice that for any $y$ and $t > 0$,

\[
\Pr\left( |\nu(y) - M_\nu(y)| > t \right) \le 4\exp\left( -C t^2/\sigma_\nu^2(y) \right) , \tag{3.9}
\]
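where $M_\nu(y)$ is a median of $\nu(y)$ and, with $A = \sqrt{n/k}\, P_\Omega$,

\begin{align*}
\sigma_\nu(y) &= \sup_{\|\beta\|^2 + \|\gamma\|^2 \le 1} \Big( \sum_{j=1}^n |(\beta^* B_y)_j|^2 + |(B_y \gamma)_j|^2 \Big)^{1/2} \\
&\le \sup_{\|\beta\|, \|\gamma\| \le 1} \|y^* A D_\beta H D_{Hx}\|_2 + \|D_{y^* A H} H D_\gamma H x\|_2 \\
&\le \sup_{\substack{\|\beta\|, \|\gamma\| \le 1 \\ \|\beta'\|, \|\gamma'\| \le 1}} y^* A D_\beta H D_{\beta'} H x + y^* A D_\gamma H D_{\gamma'} H x \\
&\le 2 \sup_{\|\beta\|, \|\gamma\| \le 1} y^* A D_\beta H D_\gamma H x \tag{3.10} \\
&\le 2\sqrt{n/k}\, \|y^* P_\Omega H\|_\infty \|Hx\|_\infty \\
&\le 2\|y\|_2 \cdot \|x\|_1/\sqrt{n} \le 2\sqrt{s/n} . \tag{3.11}
\end{align*}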


Upper bounding the expression (3.10) was done exactly as above when upper
bounding $U$. Using (3.8) and the second part of Lemma A.1 we conclude
that

\[
M_\nu(y) \le \mu_\nu(y) + C\sigma_\nu(y) \le \|y\|_2\sqrt{2/k} + C\sqrt{s/n} . \tag{3.12}
\]

We will now bound $V$. To that end, we use a general epsilon-net argument.
Given a subset $T$ of a Euclidean space, we recall that a set $\mathcal{N} \subset T$ is called
$\mu$-separated if $\|y - y'\|_2 > \mu$ for all $y, y' \in \mathcal{N}$, $y \ne y'$. It is called maximally
$\mu$-separated if no additional vector can be added to $\mathcal{N}$ in a $\mu$-separated
position.

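Lemma 3.2. Let $\gamma : \mathbb{C}^m \to \mathbb{R}^+$ be a seminorm, and let $\mathcal{N}$ denote a maximal
$\mu$-separated set of Euclidean unit vectors $y \in \mathbb{C}^m$ for some $\mu < 1$. Let

\[
S = \sup_{y \in \mathcal{N}} \gamma(y), \qquad I = \inf_{y \in \mathcal{N}} \gamma(y) .
\]

Then

\[
\sup_{\|y\| = 1} \gamma(y) \le \frac{1}{1-\mu}\, S , \tag{3.13}
\]

\[
\inf_{\|y\| = 1} \gamma(y) \ge I - \frac{\mu}{1-\mu}\, S . \tag{3.14}
\]

In particular, if $\kappa := \sup_{y \in \mathcal{N}} |\gamma(y) - 1|$, then

\[
\sup_{\|y\| = 1} |\gamma(y) - 1| \le \frac{\kappa + \mu}{1-\mu} . \tag{3.15}
\]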
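
For completeness, the step from (3.13) and (3.14) to (3.15) can be spelled out as follows. This short verification is added here for the reader and uses only $S \le 1 + \kappa$ and $I \ge 1 - \kappa$, which hold by the definition of $\kappa$.

```latex
\begin{align*}
\sup_{\|y\|=1} \gamma(y) - 1
  &\le \frac{1+\kappa}{1-\mu} - 1 = \frac{\kappa+\mu}{1-\mu}, \\
1 - \inf_{\|y\|=1} \gamma(y)
  &\le 1 - (1-\kappa) + \frac{\mu(1+\kappa)}{1-\mu}
   = \frac{\kappa(1-\mu) + \mu(1+\kappa)}{1-\mu}
   = \frac{\kappa+\mu}{1-\mu}.
\end{align*}
```

Both one-sided deviations are bounded by the same quantity, which is exactly the right-hand side of (3.15).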
The proof of the bound (3.13) is contained in [25]. The proof of (3.14)
is similar and implicitly contained in

We use the lemma by constructing a maximal $\eta = 0.1$-separated set $\mathcal{N}$
of $S_\Omega := \{y \in \mathbb{C}^n : \|y\|_2 = 1,\ \operatorname{supp} y \subseteq \Omega\}$. Using a standard volumetric
argument (see e.g. [20, Proposition 10.1]),

\[
\operatorname{card} \mathcal{N} \le (1 + 2/\eta)^{2k} = 21^{2k} . \tag{3.16}
\]

We also notice that $\nu(\cdot)$ is a seminorm for any fixed $\epsilon, \epsilon'$. Using (3.13),

\[
\sup_{\|y\| = 1,\ y = P_\Omega y} \nu(y) \le \frac{1}{1 - 0.1} \sup_{y' \in \mathcal{N}} \nu(y') .
\]

Taking expectations yields

\[
\mathbb{E} \sup_{\|y\| = 1} \nu(y) \le 1.2\, \mathbb{E} \sup_{y' \in \mathcal{N}} \nu(y') . \tag{3.17}
\]

The expectation on the right-hand side can now be bounded, in light of (3.9)
and using Lemma A.2 (with $\sigma' = 0$), as follows:

\[
\mathbb{E} \sup_{y' \in \mathcal{N}} \nu(y') \le \sup_y M_\nu(y) + C \sup_y \sigma_\nu(y) \sqrt{\log \operatorname{card} \mathcal{N}} . \tag{3.18}
\]

Together with (3.12), (3.16), and (3.11), this implies

\[
\mathbb{E} \sup_{y' \in \mathcal{N}} \nu(y') \le C\left( \sqrt{1/k} + \sqrt{\frac{s}{n}} + \sqrt{\frac{sk}{n}} \right) . \tag{3.19}
\]

If we now assume that $k \le \sqrt{n/s}$, then we conclude from (3.17), (3.19) and
(3.5) that

\[
V = \mathbb{E} \sup_{\|y\| = 1} \nu(y) \le C/\sqrt{k} \quad\text{and}\quad U \le 2/k .
\]

Plugging these upper bounds into (3.4) we conclude that for all $0 < t < 1$

\[
\Pr(|\alpha(x) - M_\alpha(x)| > t) \le 2\exp(-Ct^2 k)
\]

(because $\min\{t^2/V^2, t/U\} \ge Ct^2 k$ for $0 < t < 1$ and for the derived values
of $U, V$). But we also know, using Lemma A.1, that $|\sqrt{\mathbb{E}\alpha(x)^2} - M_\alpha(x)| \le
C/\sqrt{k}$. Recalling that $\mathbb{E}\alpha(x)^2 = 1$ and combining, we conclude that for
$t < 1$,

\[
\Pr(|\alpha(x) - 1| > t) \le C\exp\{-Ct^2 k\} . \tag{3.20}
\]

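Now we fix a support set $T \subset \{1, \ldots, n\}$ of size $s$ and consider the
complex unit sphere restricted to $T$, i.e., $S_T = \{x \in \mathbb{C}^n : \|x\|_2 = 1,\ \operatorname{supp} x \subseteq
T\}$. Let $\mathcal{N}_T$ be a maximal $\eta$-separated set of $S_T$, which has cardinality at
most $(1 + 2/\eta)^{2s}$ by (3.16). By a union bound, we have

\[
\Pr\Big( \max_{x \in \mathcal{N}_T} |\alpha(x) - 1| > t \Big) \le (1 + 2/\eta)^{2s}\, C e^{-Ct^2 k}
= C\exp\left( -Ct^2 k + 2s\log(1 + 2/\eta) \right) .
\]

It follows from Lemma 3.2 that

\begin{align*}
\Pr\Big( \max_{x \in U_s} |\alpha(x) - 1| > \frac{t + \eta}{1 - \eta} \Big)
&= \Pr\Big( \max_{|T| = s} \max_{x \in S_T} |\alpha(x) - 1| > \frac{t + \eta}{1 - \eta} \Big) \\
&\le \Pr\Big( \max_{|T| = s} \max_{x \in \mathcal{N}_T} |\alpha(x) - 1| > t \Big)
 \le \binom{n}{s} C\exp\left( -Ct^2 k + 2s\log(1 + 2/\eta) \right) \\
&\le C\exp\left( -Ct^2 k + 2s\log(1 + 2/\eta) + s\log(en/s) \right) .
\end{align*}

Choosing $\eta = \min\{t, 0.5\}$ we conclude that $\max_{x \in U_s} |\alpha(x) - 1| \le 4t$ with
probability at least $1 - \varepsilon$ if

\[
k \ge Ct^{-2}\left( s\left( \log(1 + 2/t) + \log(en/s) \right) + \log(C\varepsilon^{-1}) \right) .
\]

Replacing $t$ by $\delta/4$ and noting that (2.3) implies $\log(1 + 8/\delta) \le c\log(en/s)$
concludes the proof.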



4 The regime $s = O(\sqrt{n}/\log n)$

We now use the idea developed in the previous section to bootstrap an
efficient RIP-optimal construction for a larger regime.

Let $A$ be a fixed $k \times n$ matrix with pairwise orthogonal rows of Euclidean
length $\sqrt{n/k}$ each. Namely,

\[
AA^* = \frac{n}{k}\,\mathrm{Id}_k . \tag{4.1}
\]

The strategy will be to improve the RIP parameter $\delta_s(A)$ of $A$ by replacing
$A$ with $\hat{A} = A D_\epsilon H D_{\epsilon'} H$. To analyze the (random) RIP parameter
$\delta_s(\hat{A})$, fix an $s$-sparse unit vector $x \in \mathbb{C}^n$. Now define the random variable

\[
\alpha(x) = \|\hat{A}x\|_2 .
\]

As before, we note that $\alpha(x)$ is the norm of a decoupled Rademacher
chaos of degree 2 in a $k$-dimensional Hilbert space, which can be conveniently
written as $\alpha(x) = \|\sum_{i=1}^n \sum_{j=1}^n \epsilon_i \epsilon'_j x_{ij}\|_2$, where

\[
x_{ij} = A P_{\{i\}} H P_{\{j\}} H x .
\]

As in the previous discussion, we bound the invariants of interest $U$ and
$V$ as defined in (3.2) and (3.3), respectively. We start by bounding $U$. By
definition,

\[
U \le \sup_{\|y\|_2, \|\alpha\|_2, \|\beta\|_2 \le 1} y^* A D_\alpha H D_\beta H x . \tag{4.2}
\]

Notice now that since $x$ is $s$-sparse by assumption, we have $\|x\|_1 \le \sqrt{s}$ and
hence $\|Hx\|_\infty \le \sqrt{s/n}$, so that

\[
\|D_\beta H x\|_2 \le \sqrt{s/n} \quad\text{and}\quad \|D_\beta H x\|_1 \le 1 .
\]

The right-hand inequality is due to Cauchy-Schwarz. In turn, this
implies

\[
\|H D_\beta H x\|_\infty \le 1/\sqrt{n} \quad\text{and}\quad \|H D_\beta H x\|_2 \le \sqrt{s/n} .
\]

Therefore,

\[
\|D_\alpha H D_\beta H x\|_2 \le 1/\sqrt{n} \quad\text{and}\quad \|D_\alpha H D_\beta H x\|_1 \le \sqrt{s/n} . \tag{4.3}
\]

Again, the right-hand inequality is due to Cauchy-Schwarz. We need
the following simple lemma, see also [19, Lemma 3.1].
Lemma 4.1. Let $w \in \mathbb{C}^n$ be such that $\|w\|_1 \le \sqrt{s}\,\rho$ and $\|w\|_2 \le \rho$ for
some integer $s$ and number $\rho > 0$. Then there exist $N = \lceil n/s \rceil$ vectors
$w^{(1)}, \ldots, w^{(N)}$ such that $w^{(i)}$ is $s$-sparse for each $i$, $w = \sum_{i=1}^N w^{(i)}$ and

\[
\sum_{i=1}^N \|w^{(i)}\|_2 \le 2\rho .
\]
Proof. Assume without loss of generality that the coordinates $w_1, \ldots, w_n$ of $w$ are sorted so that
$|w_1| \ge |w_2| \ge \cdots \ge |w_n|$. For $i = 1, \ldots, N$, let

\[
w^{(i)} = (\underbrace{0, \ldots, 0}_{(i-1)s \text{ times}}, w_{(i-1)s+1}, \ldots, w_{is}, 0, \ldots, 0)^T \in \mathbb{C}^n
\]

and $a_i = \|w^{(i)}\|_\infty = |w_{(i-1)s+1}|$. Clearly $w = \sum_{i=1}^N w^{(i)}$, and we have

\[
\sum_{i=1}^N \|w^{(i)}\|_2 \le \|w^{(1)}\|_2 + \sum_{i=2}^N a_i\sqrt{s} \le \rho + \sum_{i=2}^N a_i\sqrt{s} . \tag{4.4}
\]

But now notice that for all $i = 1, \ldots, N$, $\|w^{(i)}\|_1 \ge a_{i+1}s$ (where we define
$a_{N+1} = 0$), hence we conclude, using the assumptions, that

\[
\sum_{i=2}^N a_i s \le \sum_{i=1}^N \|w^{(i)}\|_1 = \|w\|_1 \le \sqrt{s}\,\rho .
\]

Therefore, $\sum_{i=2}^N a_i\sqrt{s} \le \rho$. Together with (4.4), this implies the lemma. □
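
The block decomposition used in this proof is easy to carry out numerically. The following sketch is an added illustration, with a hypothetical function name `block_decompose` and an arbitrary test vector; it sorts coordinates by magnitude, cuts them into blocks of size $s$, and checks the conclusion of Lemma 4.1 with the smallest admissible $\rho$.

```python
import math


def block_decompose(w, s):
    """Split w into ceil(n/s) vectors, each supported on s coordinates.

    Coordinates are grouped by decreasing magnitude, so the first block
    holds the s largest entries, the second block the next s, and so on.
    """
    n = len(w)
    order = sorted(range(n), key=lambda i: -abs(w[i]))
    blocks = []
    for start in range(0, n, s):
        piece = [0.0] * n
        for i in order[start:start + s]:
            piece[i] = w[i]
        blocks.append(piece)
    return blocks


def norm2(v):
    return math.sqrt(sum(abs(t) ** 2 for t in v))


w = [0.5, -0.1, 0.05, 0.3, 0.0, -0.2, 0.02, 0.4, -0.01, 0.15, 0.0, 0.25]
s = 3
blocks = block_decompose(w, s)

# Smallest rho satisfying both premises ||w||_1 <= sqrt(s) rho, ||w||_2 <= rho.
rho = max(norm2(w), sum(abs(t) for t in w) / math.sqrt(s))
total = sum(norm2(b) for b in blocks)
```

With this choice of $\rho$ the lemma guarantees `total <= 2 * rho`, and the blocks sum back to `w`.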

Remark 4.2. Note that the technique of grouping together monotonically
decreasing coordinates of a vector in blocks is rather standard in compressive
sensing, see for example [6] or [15].

From (4.3) we conclude that $D_\alpha H D_\beta H x$ can be decomposed as $\sum_{i=1}^N w^{(i)}$
as in the lemma with $\rho = n^{-1/2}$. Using the RIP assumption on $A$, for each
$i = 1, \ldots, N$, $\|Aw^{(i)}\| \le \|w^{(i)}\|(1 + \delta)$. By the lemma's conclusion, and using
the triangle inequality, this implies

\[
U \le 2(1 + \delta)/\sqrt{n} . \tag{4.5}
\]

Bounding $V$ is done as follows. We define the process $\nu$ as

\[
\nu(y) = \left( \|B_y \epsilon\|_2^2 + \|B_y^* \epsilon'\|_2^2 \right)^{1/2} ,
\]

where $(B_y)_{ij} = y^* x_{ij}$, over the set $\{y \in \mathbb{C}^k : \|y\| \le 1\}$, so that $V =
\mathbb{E}\sup_y \nu(y)$. For any $y$, $\nu(y)$ is a Rademacher sum in $k$-dimensional Hilbert
space. Thus, we can use (3.9) to conclude that for all $y$,

\[
\Pr\left( \nu(y) \ge M_\nu(y) + t \right) \le 4\exp\left( -t^2/(8\sigma_\nu^2(y)) \right) ,
\]
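where $M_\nu(y)$ is a median of $\nu(y)$ and

\begin{align*}
\sigma_\nu(y) &= \sup_{\|\beta\|^2 + \|\gamma\|^2 \le 1} \Big( \sum_{j=1}^n |(\beta^* B_y)_j|^2 + |(B_y \gamma)_j|^2 \Big)^{1/2} \\
&\le \sup_{\|\beta\|, \|\gamma\| \le 1} \|y^* A D_\beta H D_{Hx}\| + \|D_{y^* A} H D_\gamma H x\| \\
&\le \sup_{\substack{\|\beta\|, \|\gamma\| \le 1 \\ \|\beta'\|, \|\gamma'\| \le 1}} y^* A D_\beta H D_{\beta'} H x + y^* A D_\gamma H D_{\gamma'} H x \\
&\le 2 \sup_{\|\beta\|, \|\gamma\| \le 1} y^* A D_\beta H D_\gamma H x \tag{4.6} \\
&\le 4(1 + \delta)/\sqrt{n} . \tag{4.7}
\end{align*}

For the last inequality, notice that (4.6) is bounded by twice the right-hand side of
(4.2), and recall the derivation of (4.5). Using the second part of Lemma A.1
we conclude that for all $y$ such that $\|y\| = 1$,

\[
M_\nu(y) \le \mu_\nu(y) + C(1 + \delta)/\sqrt{n} , \tag{4.8}
\]

where $\mu_\nu(y)$ is the expectation of $\nu(y)$. Jensen's inequality yields

\[
\mu_\nu(y) \le \sqrt{2}\, \|D_{A^* y} H D_{Hx}\|_F \le \sqrt{2}\, \|A^* y\| \cdot \|Hx\|/\sqrt{n} = \sqrt{2}\, \|A^* y\| \cdot \|x\|/\sqrt{n}
\le \sqrt{2}\, \sqrt{n/k}\,/\sqrt{n} = \sqrt{2}\, k^{-1/2} .
\]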
Again we notice that for any fixed $\epsilon, \epsilon'$, $\nu$ is a seminorm. As before, let $\mathcal{N}$
denote a maximal 0.1-separated set of Euclidean unit vectors in $\mathbb{C}^k$. Hence
by (3.13) in Lemma 3.2, for any fixed $\epsilon, \epsilon'$,

\[
\sup_{\|y\| = 1} \nu(y) \le 1.2 \sup_{y \in \mathcal{N}} \nu(y) .
\]

Taking expectation on both sides and using Lemma A.2 to bound the right-hand
side (recalling that the cardinality of $\mathcal{N}$ is at most $21^{2k}$), we conclude

\[
V = \mathbb{E} \sup_{\|y\| = 1} \nu(y) \le C\Big( \sup_{\|y\| = 1} M_\nu(y) + \sqrt{k} \sup_{\|y\| = 1} \sigma_\nu(y) \Big) .
\]

By (4.7) and by our bound on $M_\nu(y)$, we conclude

\begin{align*}
V &\le C\left( k^{-1/2} + (1 + \delta)/\sqrt{n} + \sqrt{k}\,(1 + \delta)/\sqrt{n} \right) \\
  &\le C\left( k^{-1/2} + (1 + \delta)\sqrt{k/n} \right) . \tag{4.9}
\end{align*}
From (4.5), (4.9) we then conclude that for all $t > 0$,

\[
\Pr(|\alpha(x) - M_\alpha(x)| > t)
\le 2\exp\left( -C \min\left\{ \frac{t^2}{\left( (1 + \delta)\sqrt{k/n} + 1/\sqrt{k} \right)^2}, \frac{t\sqrt{n}}{1 + \delta} \right\} \right) . \tag{4.10}
\]

Using the first part of Lemma A.1 and recalling that $\mathbb{E}\alpha(x)^2 = 1$, this implies
that

\[
|M_\alpha(x) - 1| \le C\left( (1 + \delta)\sqrt{k/n} + 1/\sqrt{k} \right) . \tag{4.11}
\]

We now use the net technique to pass to the supremum over all $s$-sparse
unit vectors to provide an estimate of the restricted isometry constant.
For each subset $T \subset \{1, \ldots, n\}$ of cardinality $s$ we consider a maximal
$\mu$-separated set $\mathcal{N}_T$ of the unit sphere $S_T$ of complex unit length vectors with
support $T$, where $\mu = 1/k$. By (3.16) and since $k \le \sqrt{n/s}$ and $s \le \sqrt{n}$, the
union $\mathcal{N} = \bigcup_{|T| = s} \mathcal{N}_T$ is bounded in size by

\[
\#\mathcal{N} \le \binom{n}{s}(1 + 2/\mu)^{2s} \le (en/s)^s (1 + 2\sqrt{n/s})^{2s} \le \exp(Cs\log n) . \tag{4.12}
\]



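Using Lemma A.2, (4.10) and (4.11), we conclude that

\begin{align*}
\mathbb{E} \sup_{x \in \mathcal{N}} |\alpha(x) - 1| &\le C\left( (1 + \delta)\sqrt{k/n} + 1/\sqrt{k} \right) + C\sqrt{s\log n}\,(1 + \delta)\sqrt{k/n} \\
&\quad + C(s\log n)(1 + \delta)/\sqrt{n} \\
&\le C\left( (1 + \delta)\left( \sqrt{\frac{sk\log n}{n}} + \frac{s\log n}{\sqrt{n}} \right) + \frac{1}{\sqrt{k}} \right) . \tag{4.13}
\end{align*}

We will assume in what follows that

\[
s\log n \le k , \tag{4.14}
\]

so that (4.13) takes the simpler form

\[
\mathbb{E} \sup_{x \in \mathcal{N}} |\alpha(x) - 1| \le C\left( (1 + \delta)\sqrt{\frac{sk\log n}{n}} + k^{-1/2} \right) . \tag{4.15}
\]

Recalling that $\alpha$ is a seminorm and applying (3.15) in Lemma 3.2, we
pass to the set of all $s$-sparse Euclidean unit normed vectors,

\[
\mathbb{E} \sup_{\|x\|_2 = 1,\ \|x\|_0 \le s} |\alpha(x) - 1| = \mathbb{E} \max_{|T| = s} \sup_{x \in S_T} |\alpha(x) - 1|
\le C\left( (1 + \delta)\sqrt{\frac{sk\log n}{n}} + k^{-1/2} \right) . \tag{4.16}
\]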
4.1 A Bootstrapping Argument

Let $\delta'$ denote $\delta_s(\hat{A})$. Assume henceforth that the parameters $s, k$ satisfy

\[
\kappa := C\sqrt{\frac{sk\log n}{n}} < 1/2 . \tag{4.17}
\]

Clearly (4.15) is a bound on $\mathbb{E}[\delta']$. With the new notation, we get

\[
\mathbb{E}[1 + \delta'] \le (1 + \delta)\kappa + 1 + Ck^{-1/2} .
\]

Denote $A$ by $A^{(0)}$ and $\hat{A}$ by $A^{(1)}$. Now consider inductively repeating the
above process, obtaining (for $i \ge 2$) $A^{(i)}$ from $A^{(i-1)}$ by

\[
A^{(i)} = A^{(i-1)} D_{\epsilon_{(i)}} H D_{\epsilon'_{(i)}} H ,
\]

where $\epsilon_{(i)}, \epsilon'_{(i)}$ are independent copies of $\epsilon, \epsilon'$. Let $\delta^{(i)}$ denote $\delta_s(A^{(i)})$. By
independence and the principle of conditional expectation, we conclude that

\[
\mathbb{E}[1 + \delta^{(i)}] \le (1 + \delta^{(0)})\kappa^i + \frac{1 + Ck^{-1/2}}{1 - \kappa} \le (1 + \delta^{(0)})\kappa^i + 2(1 + Ck^{-1/2}) .
\]

Assume in what follows that $k$ is large enough so that $Ck^{-1/2} \le 1$. The
last inequality conveniently implies $\mathbb{E}[1 + \delta^{(i)}] \le (1 + \delta^{(0)})\kappa^i + 4$. Recall by
our definition of $A$ that $\delta = \delta^{(0)}$ can be no more than $\sqrt{n/k}$. Let $r$ be taken
as

\[
r := \left\lceil \frac{-\log(2\sqrt{n/k})}{\log \kappa} \right\rceil \tag{4.18}
\]

so that $(1 + \delta^{(0)})\kappa^r \le (1 + \sqrt{n/k})\kappa^r \le 2\sqrt{n/k}\,\kappa^r \le 1$ and hence

\[
\mathbb{E}[1 + \delta^{(r)}] \le 5 .
\]

Using Markov's inequality, this implies that with probability at least, say,
0.995,

\[
1 + \delta^{(r)} \le 1000 . \tag{4.19}
\]
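
The choice of $r$ in (4.18) and the resulting bound on the expectation can be traced numerically. The sketch below is an added illustration: the function names, the sample values of $n$, $k$ and $\kappa$, and the worst-case substitution $Ck^{-1/2} = 1$ are assumptions made for the example, not values from the paper.

```python
import math


def bootstrap_rounds(n, k, kappa):
    """Number of rounds r from (4.18): ceil(-log(2 sqrt(n/k)) / log(kappa))."""
    if not 0.0 < kappa < 0.5:
        raise ValueError("kappa must lie in (0, 1/2)")
    return math.ceil(-math.log(2.0 * math.sqrt(n / k)) / math.log(kappa))


def expectation_bound(n, k, kappa, rounds, c_term=1.0):
    """Iterate E[1 + delta_i] <= E[1 + delta_{i-1}] * kappa + 1 + c_term.

    Starts from the trivial bound 1 + delta_0 <= 1 + sqrt(n/k); c_term
    stands in for C k^{-1/2}, assumed to be at most 1 as in the text.
    """
    value = 1.0 + math.sqrt(n / k)
    for _ in range(rounds):
        value = value * kappa + 1.0 + c_term
    return value


n, k, kappa = 2 ** 20, 1000, 0.25       # k <= sqrt(n) = 1024
r = bootstrap_rounds(n, k, kappa)
residual = 2.0 * math.sqrt(n / k) * kappa ** r
bound = expectation_bound(n, k, kappa, r)
```

For these sample values the initial distortion bound of about 32 is driven below the constant 5 after `r` rounds, and one fewer round would leave the residual term above 1.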

From now on assume that the event (4.19) holds. Now for an $s$-sparse unit vector $x$,
let $x^{(r+1)} := A^{(r+1)}x$. The assumption $k \le \sqrt{n}$ is equivalent to $1/\sqrt{k} \ge \sqrt{k/n}$.
Using this, and substituting a constant for $(1 + \delta)$, (4.10) implies

\[
\Pr\left( \left| \|x^{(r+1)}\|_2 - M_{\|x^{(r+1)}\|_2} \right| > t \right)
\le 2\exp\left( -C \min\left\{ \frac{t^2}{(k^{-1/2})^2},\ t\sqrt{n} \right\} \right) , \tag{4.20}
\]

where $M_{\|x^{(r+1)}\|_2}$ is a median of $\|x^{(r+1)}\|_2$.

Once again we consider maximal $\mu$-separated sets $\mathcal{N}_T$ of $S_T$ with $\mu = 1/k$
for each $T \subset \{1, \ldots, n\}$ of size $s$ and form $\mathcal{N} = \bigcup_{|T| = s} \mathcal{N}_T$. The cardinality
of $\mathcal{N}$ is at most $\exp\{Cs\log n\}$, see (4.12). We can now use a union bound
over $\mathcal{N}$ to conclude that with probability at least 0.995,



max 



b(''+^)||2-M| 



< C max 



s log n s log n 



n 



(4.21) 



Using (4.11), this implies



max 



|x(^+l)||2-l 



< C max 



sfc log n s log n 



n 



+ 



n 



< C 



sk log 77, 



77 



(4.22) 



where the last inequality used (4.14). As before, using (3.15) in Lemma 3.2
allows us to pass to the set of all $s$-sparse vectors:



sup 

11^11=1 

||x||o<s 



|x(^+')||2-l 



< c 



sk log 77 



77 



Recalling the assumption $k \le \sqrt{n}$, this implies

\[
\delta_s(A^{(r+1)}) \le C\sqrt{\frac{s\log n}{k}} , \tag{4.23}
\]

and the proof of Theorem 2.2 is concluded.



A Properties of Mixed Gaussian and Exponential 
Processes 

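Lemma A.1. Assume $X$ is a random variable such that for some number
$M$ and for all $t > 0$,

\[
\Pr[|X - M| > t] \le C\exp\left\{ -\min\{t^2/\sigma_1^2,\ t/\sigma_2\} \right\} ,
\]

for some $\sigma_1, \sigma_2 > 0$. Then

1. $\left| (\mathbb{E}X^2)^{1/2} - |M| \right| \le C'\sqrt{\sigma_1^2 + \sigma_2^2}$,

2. $|\mathbb{E}X - M| \le C'(\sigma_1 + \sigma_2)$,

for some constant $C'$ that depends only on $C$.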

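Proof. For the first part, assume that $M \ge 0$ for the moment. By integrating
and changing variables,

\begin{align*}
\mathbb{E}(X - M)^2 &= \int_0^\infty \Pr[|X - M| > \sqrt{t}]\,dt \\
&\le C\int_0^\infty \exp\{-t/\sigma_1^2\}\,dt + C\int_0^\infty \exp\{-\sqrt{t}/\sigma_2\}\,dt \\
&= C\sigma_1^2 \int_0^\infty e^{-s}\,ds + 2C\sigma_2^2 \int_0^\infty e^{-s}s\,ds \\
&\le (\sigma_1^2 + \sigma_2^2)C' \tag{A.1}
\end{align*}

for some constant $C' > 0$. On the other hand, with $p^2 = \mathbb{E}X^2$ we have
$\mathbb{E}(X - M)^2 = p^2 + M^2 - 2M\,\mathbb{E}X$ and $\mathbb{E}X \le \mathbb{E}|X| \le \sqrt{\mathbb{E}X^2} = p$. We hence
conclude that $(p - M)^2 \le (\sigma_1^2 + \sigma_2^2)C'$. If $M < 0$ then we simply replace the
random variable $X$ with $-X$ and $M$ with $-M$.

The second part is obtained in the same way by integrating to bound
$\mathbb{E}|X - M|$. □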

Lemma A.2. Assume that $X_i$, $i = 1, \ldots, N$, are random variables such that
for each $i$ there exist numbers $M_i$ and $\sigma_i, \sigma'_i > 0$ such that for all $t > 0$,

\[
\Pr[|X_i - M_i| > t] \le 2\exp\left\{ -\min\{t^2/\sigma_i^2,\ t/\sigma'_i\} \right\} .
\]

Then

\[
\mathbb{E} \sup_i |X_i - M_i| \le C\left( \sqrt{\log N}\, \sup_i \sigma_i + \log N\, \sup_i \sigma'_i \right) . \tag{A.2}
\]

Note that the variables $X_i$ are not required to be independent. The proof
can be done by integration by parts, very similar to the derivation of (3.6)
in [16].